Memory and Computation Efficient PCA via Very Sparse Random Projections

Authors

  • Farhad Pourkamali Anaraki
  • Shannon M. Hughes
Abstract

Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Indeed, our approach is simultaneously efficient in memory/storage space, efficient in computation, and produces accurate PC estimates, while also allowing for rigorous theoretical performance analysis. Moreover, one can tune the sparsity of the random vectors deliberately to achieve a desired point on the tradeoffs between memory, computation, and accuracy. We rigorously characterize these tradeoffs and provide statistical performance guarantees. In addition to these very sparse random vectors, our analysis also applies to more general random projections. We present experimental results demonstrating that this approach allows for simultaneously achieving a substantial reduction of the computational complexity and memory/storage space, with little loss in accuracy, particularly for very high-dimensional data.
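The idea of estimating a principal component from projections onto very sparse random vectors can be illustrated with a minimal sketch. This is not the authors' estimator: the entry distribution (values of plus or minus sqrt(s), each with probability 1/(2s), zero otherwise), the function names, and the parameters `m` (projections per sample) and `s` (sparsity level) are all assumptions made for illustration, and the sketch applies no bias correction. Each sample is compressed to `m` numbers, back-projected, and the top eigenvector of the covariance of the back-projected samples is taken as the estimate.

```python
import numpy as np


def very_sparse_matrix(rng, shape, s):
    """Assumed distribution: +sqrt(s) or -sqrt(s) w.p. 1/(2s) each, else 0."""
    signs = rng.choice(
        [-1.0, 0.0, 1.0], size=shape, p=[0.5 / s, 1.0 - 1.0 / s, 0.5 / s]
    )
    return np.sqrt(s) * signs


def estimate_top_pc(X, m, s, rng):
    """Illustrative top-PC estimate from sparse projections (no bias correction).

    X is an (n, p) array of zero-mean samples. Every sample gets its own
    p-by-m sparse random matrix, so only m numbers per sample (plus the
    sparse matrix, or the seed that generated it) would need to be kept.
    """
    n, p = X.shape
    R = very_sparse_matrix(rng, (n, p, m), s)
    Y = np.einsum("npm,np->nm", R, X)  # compressed samples, shape (n, m)
    W = np.einsum("npm,nm->np", R, Y)  # back-projected samples, shape (n, p)
    cov = W.T @ W / n                  # covariance of the back-projections
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]              # eigenvector of the largest eigenvalue


def make_data(rng, n, p, noise=0.05):
    """Hypothetical test data: one dominant direction u plus small noise."""
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    X = rng.standard_normal((n, 1)) * u + noise * rng.standard_normal((n, p))
    return X, u
```

A typical call would be `X, u = make_data(rng, 3000, 50)` followed by `estimate_top_pc(X, m=15, s=7, rng=rng)`, with `rng = np.random.default_rng(0)`. Larger `s` makes the projection vectors sparser and cheaper to apply, which is the kind of memory, computation, and accuracy tradeoff the abstract refers to.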


Related resources

OR-PCA with MRF for Robust Foreground Detection in Highly Dynamic Backgrounds

Accurate and efficient foreground detection is an important task in video surveillance systems. The task becomes more critical when the background scene shows more variation, such as water surfaces, waving trees, varying illumination conditions, etc. Recently, Robust Principal Components Analysis (RPCA) has provided a very nice framework for moving object detection. The background sequence is modeled b...


Learning from High Dimensional fMRI Data using Random Projections

The term “the Curse of Dimensionality” refers to the difficulty of organizing and applying machine learning to data in a very high dimensional space. The reason for this difficulty is that as the dimensionality increases, the volume between different training examples increases rapidly and the data becomes sparse and difficult to classify. So, the predictive power of a machine learning algorith...


Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...


Sparse Principal Component Analysis via Regularized Low Rank Matrix Approximation

Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e. loading...


Learning Compressed Sensing

Compressed sensing [7], [6] is a recent set of mathematical results showing that sparse signals can be exactly reconstructed from a small number of linear measurements. Interestingly, for ideal sparse signals with no measurement noise, random measurements allow perfect reconstruction while measurements based on principal component analysis (PCA) or independent component analysis (ICA) do not. A...



Publication date: 2014